Return-Path: <icon-group-sender>
Received: (from root@localhost)
by baskerville.CS.Arizona.EDU (8.11.1/8.11.1) id h1RJNMc10038
for icon-group-addresses; Thu, 27 Feb 2003 12:23:22 -0700 (MST)
Message-Id: <200302271923.h1RJNMc10038@baskerville.CS.Arizona.EDU>
Subject: Help with high level guidance on text searching algorithms
To: icon-group@cs.arizona.edu
From: "David Gamey" <dgamey@ca.ibm.com>
Date: Thu, 27 Feb 2003 10:34:33 -0500
X-MIMETrack: Serialize by Router on D01ML391/01/M/IBM(Release 5.0.11 +SPRs MIAS5EXFG4, MIAS5AUFPV
and DHAG4Y6R7W, MATTEST |November 8th, 2002) at 02/27/2003 10:33:54 AM
Errors-To: icon-group-errors@cs.arizona.edu
Status: RO
Hi all,
I've been checking some links for different algorithms and how they apply
to different problems. The field has really exploded since the last time
I looked in detail. A quick poke about in the IPL didn't turn up anything.
I did find lots of detailed links (off the agrep site) but I really want
to see the forest (not the trees, branches and roots) right now. Perhaps
someone could give me a pointer or two.
The problem I'm looking at is related to searching through sets of text
looking for commonality (substrings not patterns - although that would be
of secondary interest). I'm not looking for optimal or minimal sets of
substrings but relatively good matches. Given a set of messages, I'd like
to be able to categorize them into sets that share common characteristics -
probable this, probable that. This has to be similar to what things like
search engines and spam filters do.
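To make the kind of grouping described above concrete, here is a minimal sketch in Python. It is only an illustration, not something taken from the IPL or the agrep links: it assumes that "commonality" can be approximated by the overlap of fixed-length substrings (character shingles) and that a greedy first-fit grouping is good enough. The names shingles, jaccard and group_messages, the shingle length k=5 and the threshold 0.3 are all hypothetical choices.

```python
def shingles(text, k=5):
    """Return the set of length-k substrings of the lowercased,
    whitespace-normalized text."""
    s = " ".join(text.lower().split())
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|.
    Two empty sets are treated as having similarity 0.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_messages(messages, k=5, threshold=0.3):
    """Greedy first-fit grouping (an assumed, non-optimal strategy).

    Each message joins the first group whose representative (the first
    message placed in that group) shares enough substrings with it;
    otherwise it starts a new group. Returns lists of message indices.
    """
    groups = []  # list of (representative shingle set, member indices)
    for idx, msg in enumerate(messages):
        sh = shingles(msg, k)
        for rep, members in groups:
            if jaccard(sh, rep) >= threshold:
                members.append(idx)
                break
        else:
            groups.append((sh, [idx]))
    return [members for _, members in groups]


if __name__ == "__main__":
    sample = [
        "Buy cheap pills now, limited offer",
        "Buy cheap pills today, limited offer",
        "Meeting moved to Tuesday at noon",
    ]
    print(group_messages(sample))
```

The result depends on message order and on the chosen k and threshold, so this gives "relatively good" groupings at best, not optimal or minimal ones.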
Thanks in advance.
David